From Images to Signals: Are Large Vision Models Useful for Time Series Analysis?

A Systematic Evaluation of LVMs on Classification and Forecasting Tasks in Time Series

Published: May 29, 2025

Authors: Z. Zhao et al.
Link: http://arxiv.org/abs/2505.24030v1
Institutions: University of Houston • University of Illinois at Urbana-Champaign • University of Connecticut • Squirrel Ai Learning
Keywords: Large Vision Models, Vision Transformer (ViT), Swin Transformer, Masked Autoencoders (MAE), SimMIM, Time Series Analysis, Time Series Classification, Time Series Forecasting, Imaging methods, Self-supervised learning, Transfer learning, Ablation study, Multimodal learning, Foundation models, Benchmark datasets, Inductive bias


Large Vision Models (LVMs) such as ViT, Swin, MAE, and SimMIM have shown impressive results on computer vision tasks, sparking interest in applying them to other domains, including time series analysis. Transformers and Large Language Models (LLMs) have already been explored for time series with mixed success, and mapping time series data to images offers a way to leverage pretrained vision models for a non-visual problem. The broader trend toward multimodal and foundation models in time series research makes this approach relevant to both classification and forecasting tasks.
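One common way to map a time series to an image, so that a pretrained LVM can consume it, is the Gramian Angular Summation Field (GASF). The sketch below is illustrative only and is not necessarily the exact imaging method used in the paper; it assumes a univariate series and uses plain NumPy.

```python
import numpy as np

def gramian_angular_field(x: np.ndarray) -> np.ndarray:
    """Encode a 1-D time series as a 2-D image (Gramian Angular Summation Field)."""
    # Rescale the series to [-1, 1] so that arccos is well defined.
    x_min, x_max = x.min(), x.max()
    x_scaled = 2 * (x - x_min) / (x_max - x_min) - 1
    # Map each value to an angle, then build the pairwise cosine-sum matrix:
    # GASF[i, j] = cos(phi_i + phi_j).
    phi = np.arccos(np.clip(x_scaled, -1.0, 1.0))
    return np.cos(phi[:, None] + phi[None, :])

# Example: a 64-step sine wave becomes a 64x64 single-channel "image"
# that could be resized and fed to a ViT- or Swin-style backbone.
series = np.sin(np.linspace(0, 4 * np.pi, 64))
image = gramian_angular_field(series)
print(image.shape)  # (64, 64)
```

The resulting matrix is symmetric with values in [-1, 1], so it can be treated like a normalized grayscale image; other imaging choices (recurrence plots, simple line-plot rasterizations) follow the same pattern of turning temporal structure into spatial structure.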

To address this open question, the authors propose and systematically evaluate several approaches and contributions:

Following the detailed methodological setup, the study reports key experimental findings:

In light of these results, the researchers draw several important conclusions and outline future directions: